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Previous simulations of the growth of cosmic structures have broadly reproduced the ‘cosmic web’ of galaxies that we 
see in the Universe, but failed to create a mixed population of elliptical and spiral galaxies, because of numerical in- 
accuracies and incomplete physical models. Moreover, they were unable to track the small-scale evolution of gas and 
stars to the present epoch within a representative portion of the Universe. Here we report a simulation that starts 12 
million years after the Big Bang, and traces 13 billion years of cosmic evolution with 12 billion resolution elements in a 
cube of 106.5 megaparsecs a side. It yields a reasonable population of ellipticals and spirals, reproduces the observed 
distribution of galaxies in clusters and characteristics of hydrogen on large scales, and at the same time matches the 


‘metal’ and hydrogen content of galaxies on small scales. 


The initial conditions for structure formation in the Universe are tightly 
constrained from measurements of anisotropies in the cosmic micro- 
wave background radiation’. However, previous attempts to reproduce 
the properties of the observed cosmological structures with computer 
models have shown only limited success. No single, self-consistent sim- 
ulation of the Universe was able to simultaneously predict statistics on 
large scales, such as the distribution of neutral hydrogen or the galaxy 
population of massive galaxy clusters, together with galaxy properties 
on small scales, such as the morphology and detailed gas and stellar con- 
tent of galaxies. The challenge lies in following the baryonic component 
of the Universe using hydrodynamic simulations” *, which are required 
to model gas, stars, supermassive black holes (SMBHs) and their related 
energetic feedback. The vast computational challenges of these simula- 
tions have forced previous attempts to focus either on simulating only 
small portions of the Universe or to employ a coarse resolution such that 
internal characteristics of galaxies could not be resolved. Furthermore, 
the large dynamic range of length scales in the problem forces common 
hydrodynamic solvers to sacrifice accuracy. Finally, poorly understood 
small-scale processes such as star formation and accretion onto SMBHs 
are coupled to galactic and super-galactic scales, introducing large uncer- 
tainties into modelling techniques. 

Rapid advances in computing power combined with improved nu- 
merical algorithms and more faithful models of the relevant physics 
have allowed us to produce a simulation (named Illustris) that simul- 
taneously follows the evolution of dark matter and baryons in detail. 
Starting approximately 12 million years (Myr) after the Big Bang, our 
simulation tracks the evolution of more than 12 billion resolution ele- 
ments in a volume of (106.5 Mpc)’ up to the current epoch (redshift 
z = 0). This allows us to achieve a dark-matter mass resolution of 6.26 X 
10° times that of the Sun (Mo), and a baryonic mass resolution of 
1.26 X 10°M«. The smallest scale over which the hydrodynamics is re- 
solved is 48 pc, whereas gravitational forces are resolved down to 710 pc 
atz = 0 and to even smaller scales at high redshifts (for example, 473 pc 
at z = 2). Our calculation therefore overcomes the problems of previous 
hydrodynamic simulations which either did not cover a large enough 
portion of the Universe to be representative, lacked adequate resolu- 
tion, or failed to reach the present epoch. Apart from having a large 


volume and improved resolution, our simulation is evolved with the novel 
hydrodynamic algorithm AREPO”, which uses a moving unstructured 
Voronoi tessellation in combination with a finite volume approach 
(Methods). Finally, we employ a numerically well-posed and reasonably 
complete model for galaxy formation physics, which includes the for- 
mation of both stars and SMBHs, and their effects on their environments 
in forms of galactic super-winds driven by star formation, as well as 
radio bubbles and radiation proximity effects caused by active galactic 
nuclei (AGNs; see Methods). 

Unlike previous attempts, we find a mix of galaxy morphologies 
ranging from blue spiral galaxies to red ellipticals, with a hydrogen and 
‘metal’ (that is, all elements other than hydrogen and helium) content 
in good agreement with observational data. At the same time, our model 
predicts correctly the large-scale distribution of neutral hydrogen, and 
the radial distribution of satellite galaxies within galaxy clusters. Our 
results therefore demonstrate that the A cold dark matter (ACDM) model 
can correctly describe the variety of observational data on small and 
large scales in our Universe. It also predicts a strong, scale-dependent 
impact of baryonic effects on the dark-matter distribution, at a level 
that has significant implications for future precision probes of cosmology. 


Observing the model Universe 


The simulation volume contains 41,416 galaxies at z = 0 that are re- 
solved with more than 500 stellar resolution elements. Our model yields 
a population of non-star-forming elliptical galaxies, star-forming disk 
galaxies, and irregular galaxies (Fig. 1a). We find that galaxies with low 
star-formation rates contain about 52% of all the stellar mass that is 
in galaxies more massive than M+ = 10°Mo, which agrees well with 
observations® (54%-60%). Simulating the formation of realistic disk 
galaxies, like our own Milky Way, has remained an unsolved problem 
for more than two decades. The culprit was an angular momentum deficit 
leading to too high central concentrations, overly massive bulges and 
unrealistic rotation curves’*. The fact that our calculation naturally pro- 
duces a morphological mix of realistic disk galaxies coexisting with a 
population of ellipticals resolves this long-standing issue. It also shows 
that previous futile attempts to achieve this were not due to an inherent 
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flaw of the ACDM paradigm, but rather due to limitations of numer- 
ical algorithms and physical modelling. 

As our simulation follows the evolution of galaxies starting rela- 
tively shortly after the Big Bang, we can construct virtual mock obser- 
vations that mimic the conditions that the Hubble Space Telescope 
encounters as it images galaxies across cosmic time in very deep surveys, 
such as the Ultra Deep Field (UDF), and the continuing Frontier Fields 
programme. These observations capture a large variety of galaxy lumi- 
nosities, sizes, colours, morphologies and evolutionary stages, provid- 
ing remarkable benchmarks for galaxy formation theories. We have 
constructed a mock UDF and compare it side by side with data from 
the HST eXtreme Deep Field (XDF) compilation” (Fig. 1b, c). Galaxies 
in the mock UDF appear strikingly similar to the observed population 
in terms of number density, colours, sizes and morphologies. Our model 
is the first hydrodynamic simulation from which a faithful deep UDF- 
like observation could be constructed, thanks to its combination of large 
volume and high resolution, along with the new numerical techniques 
that allow it to reproduce realistic galaxy morphologies. 


Satellite galaxies in clusters 

This qualitative agreement also extends to many quantitative probes 
of the distribution and internal structure of galaxies. The abundance 
and spatial distribution of satellite galaxies forms an important obser- 
vational test of the ACDM paradigm, as it is very sensitive both to details 
of the hierarchical structure formation process and to galactic-scale 
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Figure 1 | Mock images of the simulated galaxy population. a, Stellar light 
distributions (g, r, i bands) for a sample of galaxies at z = 0 arranged along the 
classical Hubble sequence for morphological classification. Our simulation 
produces a variety of galaxy types, ranging from ellipticals to disk galaxies to 
irregular systems, the last mostly resulting from interactions and mergers. 

b, HST UDF image (2.8 arcmin on a side) in B, Z and H bands convolved with 
Gaussian point-spread functions of o = 0.04, 0.08 and 0.16 arcsec, respectively. 
c, HST mock observation from Illustris. 
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baryonic processes. In fact, two of the most acute challenges to the ACDM 
modelare related to satellite galaxies on galactic scales, giving rise to the 
‘missing satellite’ and ‘too-big-to-fail’ problems". A particularly tax- 
ing problem is the radial distribution of satellite galaxies within galaxy 
clusters, which are the largest gravitationally bound objects in the Universe. 

Recently, large Sloan Digital Sky Survey cluster samples have made 
it possible to measure accurate radial satellite profiles’*’’. Theoretical 
models have so far struggled to reproduce the shapes and normaliza- 
tions of these profiles. Semi-analytic models have resorted to ad hoc 
prescriptions for mass-stripping and tuned approximations for satellite 
orbits because they are missing the gravitational effects of the stellar 
component. Despite the freedom offered by the adopted coarse para- 
meterizations, most semi-analytic models consistently find radial pro- 
files that are too steep, and in particular overestimate the number of 
satellites in the inner regions of clusters (radius r < 100-200 kpc)'*"’. 
Previous hydrodynamic simulations, on the other hand, found inner 
radial profiles that are too shallow and in most cases could not repro- 
duce the observed normalization'*“*, owing both to limited resolution 
and to an over-production of stars in satellites due to missing physical 
processes. 

For the most massive haloes, which host the largest accumulations 
of galaxies in the Universe, we have calculated a stacked projected gal- 
axy count profile and compared it to results from a sample" of satellite 
galaxies with r-band magnitude <—20.5, for host systems at 0.15 =z 
< 0.4 extracted from 55,121 Sloan Digital Sky Survey groups and clus- 
ters (107 Mo < Msoo,crit < 10° Mo, where M5oo,crit is the character- 
istic mass enclosed in a radius with mean density 500 times the critical 
density for closure) centred on luminous red galaxies (Fig. 2). Having 
higher resolution than previous hydrodynamic simulations, and more 
realistic feedback models that suppress the stellar masses of satellites, 
we obtain a good agreement with both the observed profile shape and 
its normalization, as well as with the mean galaxy colour as a function 
of cluster-centric distance”’. The radial distribution of dark-matter sub- 
haloes from a corresponding dark-matter-only simulation (‘Illustris- 
Dark’) flattens towards the centre. This difference is due to dissipational 
processes of galaxy formation that make the stellar component more 
resistant to tidal disruption close to cluster centres. This directly dem- 
onstrates that neglecting baryonic physics causes inaccuracies in the 
spatial distribution of satellite galaxies which in turn can lead to errors 
in estimates of galaxy merger rates. 


Metals and neutral hydrogen in galaxies 


The high mass resolution of the simulation makes it possible to study 
the internal characteristics of galaxies. It allows us, for example, to make 
clear predictions for the stellar and gaseous contents of galaxies, and 
for the baryonic cycle operating between them. Our model predicts the 
present-day H 1 mass, Mps of galaxies in the local Universe, which can 
be compared to observations as revealed by the Arecibo Legacy FAST 
ALFA Survey (ALFALFA; Fig. 3a). As stars form out of cold gas which 
is largely neutral, the H1 richness is a good probe of the reservoir of 
gas available for star formation in a galaxy. The H I-selected samples of 
ALFALFA are typically biased towards the most gas-rich star-forming 
galaxies; that is, the major limitation of H 1-selected samples is that they 
will miss the most gas-poor elliptical galaxies. 

Our results recover the trend of decreasing H1 richness with in- 
creasing stellar mass. For the most massive galaxies, the full simulated 
galaxy sample deviates from the observed mean relation. Many of these 
massive galaxies have been quenched through AGN feedback and there- 
fore contain little amounts of gas; that is, they would not be detected by 
ALFALFA given their low H 1 content. If we focus instead on star-forming 
galaxies, we find significantly better agreement even at higher masses, 
and we reproduce the observed H 1 richness relation over nearly four orders 
of magnitude in stellar mass. We also include a separate H 1 richness re- 
lation for the satellites of the most massive cluster and compare this 
to H1 measurements of galaxies in the Virgo cluster” where ram pres- 
sure removes gas from infalling galaxies (Fig. 3a). Such hydrodynamic 
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Figure 2 | Projected number density profile of satellite galaxies in 

galaxy clusters. Symbols show a simulation sample as a function of 
cluster-centric projected radius, considering 10 haloes more massive than 
Msoo,crit = 10'*’Mo and all their satellite galaxies brighter than r < —20.5 mag 
within 3 X Rsoo,crit along the line of sight from the cluster centre. bg is the 
projected galaxy number density, r, is the projected distance from the halo 
centre and <fsoo,crit> is the mean radius, which encloses Msoo,crit- The symbol 
colour reflects the average (g — r) colour for galaxies at that distance (see key). 
A reddening towards the cluster centre is clearly visible. Observational data? 
are shown for comparison (green line), while the dashed line gives the subhalo 
profile of the corresponding dark-matter (DM)-only simulation (normalized 
to match the satellite profile at large radii). The left blue region marks the area 
of incompleteness in the observed number density profiles due to obscuration 
from the brightest cluster galaxy (BCG). The green shaded region marks the 
observational uncertainties: outside the region of contamination of the BCG 
the scatter within the observational stack dominates, whereas Poisson errors 
are shown within that region. 


processes are not directly accessible to semi-analytic models but are cap- 
tured in our simulation. We can therefore recover the observed trend: 
satellites in cluster environments have a lower H 1 content compared to 
the whole population. The predicted difference is not as pronounced as 
in the observations, but this may well be caused by the fact that our volume 
does not contain a galaxy cluster as massive as Virgo. 

Every dynamical time, a small fraction of the galactic cold neutral 
gas turns into stars, which inherit the chemical composition of the gas. 
Stellar metallicities therefore probe different processes intrinsic to gal- 
axy formation: chemical enrichment, feedback, and in- and outflows. 
Hydrodynamic cosmological simulations have so far been unable to 
produce correct stellar metallicities; measures such as the mean stellar 
metallicity or the cosmic density of total metal mass locked up in stars, 
pz» were discrepant from the observed values by almost a factor of 
two?™??, Our model, on the other hand, produces pz» = 7.75 X 10°Mo 
Mpc *, in agreement with the observed value” of pz»+= (7.1 + 2) X 
10ĉMo Mpc `°. 

Recent observations find a relation where more massive galaxies con- 
tain a larger proportion of metals****. Our predictions agree well with 
these observations, and, most importantly, the simulation recovers the 
flattening of the relation above Mx ~ 10'°Me (Fig. 3b). The observed 
trend is reproduced over nearly five orders of magnitude in stellar mass. 
These agreements with observations are driven by the combination of 
our accurate hydrodynamic scheme together with realistic effective feed- 
back models and a complete stellar evolution prescription, a combina- 
tion that was lacking in previous studies. 
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Figure 3 | Neutral hydrogen and metal content of galaxies as a function of 
their stellar mass. a, Predicted H1 richness of galaxies (Mp is the mass in 
neutral hydrogen) compared to the «.40 sample of the ALFALFA survey. Filled 
regions mark bins with more than 20 galaxies (red, ALFALFA; green, all 
galaxies; blue, star-forming galaxies). Error bars indicate s.e.m. for individual 
galaxies. A separate H 1 richness relation for the satellites of the most massive 
cluster is included and compared to H 1 measurements of Virgo satellites”? to 
demonstrate the strong effect of environment on the H1 content of satellites. 
b, Stellar metal content of simulated galaxies, Z, in units of solar metallicity, 
Zo (blackline and grey shaded squares), compared to observations?” (green). 
Reassuringly, the trend of the median relation including the flattening above the 
stellar mass M= ~ 10''Mo is well recovered. The green shaded region 
represents the s.d. The error bars represent the s.e.m. for individual galaxies. 


Large-scale characteristics of neutral hydrogen 

The space between galaxies, the intergalactic medium, is filled with low- 
density, warm gas, mainly composed of ionized hydrogen. Observa- 
tionally, the intergalactic medium can be probed through the forest of 
Lyman-« absorption lines in quasar spectra. The corresponding dis- 
tribution of hydrogen column densities is well-constrained*””* over 
a wide dynamic range. At high gas densities, hydrogen becomes self- 
shielded from the ionizing ultraviolet background and forms dense 
neutral clouds with a distinctive spectral absorption signature, the so- 
called damped Lyman-z absorbers (DLAs), which are observed mostly 
at redshift z = 2-5 and probe initial stages of galaxy formation. 
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Early hydrodynamic simulations successfully predicted the low- 
column-density statistics of the Lyman-« forest”’, whereas describing 
the main properties of higher-density absorbers and DLAs correctly only 
became feasible over the past few years*”*’. However, recent numerical 
calculations have failed to explain the metallicity distribution of DLAs, 
which is now reasonably well constrained by data’. The outcomes 
of most simulations gave metallicities for DLAs that are too high or 
yielded a metallicity distribution that is too broad, in disagreement with 
observations****, 

We contrast our predicted H 1 column density distribution function 
(CDDF; Fig. 4a) and the metal content of DLAs (Fig. 4b) with observations. 
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Figure 4 | Large-scale characteristics of neutral hydrogen. a, Column density 
distribution function (CDDF), f(Niy), of neutral hydrogen at z = 3 compared to 
observations”**”. The dashed vertical line shows the density threshold 
separating damped Lyman-o systems (DLAs) and Lyman limit systems. The 
shaded region shows an estimate for the CDDF constrained by the assumption 
of a power law fit and the observed incidence of Lyman limit systems. The 
vertical error bars represent the s.e.m., and the horizontal ones represent the 
binning. b, Probability density function of the DLA metallicity in units of solar 
metallicity at z = 3 compared to observational findings’. The observational 
vertical error bars show the s.e.m. derived from the number of observed spectra 
in each metallicity bin. Bin widths have been chosen to be larger than the 
maximal uncertainty in each individual metallicity measurement (horizontal 
error bars). 
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For the DLA metallicities, we compare to an observational compendium”, 
based on all available quasars between z = 2 and z = 4, whereas the CDDF 
data are centred at z = 3. The properties of DLAs are sensitive to the 
balance between gas accretion, outflows and ionization. Probing such 
a complex interplay of processes is a particular strength of hydrodyn- 
amic simulations. The prediction of our model is in remarkable agree- 
ment with the observational data, reproducing in detail the shape and 
location of the metallicity peak. This success is a result of accurate hy- 
drodynamics and modelling of galactic super-winds resulting in metal 
outflows. Also, we find good agreement between our theoretical pre- 
dictions and the observed CDDF. 

At low redshift, most of the baryons have not yet been detected. Im- 
proved instrumental sensitivity may lead to their detection in the fore- 
seeable future”, at a level depending on exactly where they reside, and 
at what temperature, which is a matter of theoretical debate. On the 
basis of our model we predict that, at z = 0, gas that is not bound to 
any haloes constitutes 81% of the total baryons in the Universe, but 
contains only 34% of the heavy metals. 


The impact of baryons on dark matter 
The discussion thus far has stressed that a direct modelling of baryonic 
physics is essential to soundly connect cosmological predictions to gal- 
axy observations. Even beyond that, baryonic processes can actually 
affect and modify the dark-matter distribution, and therefore alter the 
matter power spectrum P(k). Measuring P(k) as a function of wave- 
number, k, provides a powerful cosmological probe, because theoret- 
ical models can be used to connect the present-day P(k) to the initial 
power spectrum. Such measurements come, for example, from weak 
lensing studies, galaxy clustering surveys and the analysis of the Lyman- 
a forest. However, the interpretation of upcoming weak lensing surveys 
such as EUCLID, which will measure P(k) on scales of 0.1h Mpc! <k 
< 10h Mpc ', must consider the impact of baryons in order to achieve 
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Figure 5 | Nonlinear matter power spectrum. a, The dimensionless total 
matter power spectrum, A’(k), of the Illustris simulation (black line) differs 
significantly, owing to baryonic effects, from that of the dark-matter-only 
counterpart Illustris-Dark (light blue). Analytic fitting models’ (green and 
pink) do not provide an adequate description of the hydrodynamic results. The 
theoretical shot noise level (shown as thin dashed lines) has been subtracted in 
the measurements. b, The relative differences between Illustris and Illustris- 
Dark highlight the fact that baryonic effects already exceed 1% on scales smaller 
than k= 1h Mpc". Dashed black lines mark 1%, 10% and 100% relative 
differences. 
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the required level of ~1% accuracy**’”. This effect has often been ne- 
glected, even though it is not restricted to just the smallest spatial scales. 
We have calculated the dimensionless matter power spectrum, A(k) 
= kP(k)/(20°), along with the corresponding dark-matter-only result 
(Fig. 5). The dynamic range of our simulation allows us to probe P(k) 
on a wider range of scales than possible before through a single hy- 
drodynamic simulation. On scales smaller than k ~ 1h Mpc", AGN- 
driven outflows reduce the total power in a scale-dependent way by 
up to ~30%-40% compared to the dark-matter-only prediction. This 
impact is even larger than found in previous studies*, which is most 
probably related to the strong radio-mode AGN feedback in our simu- 
lation needed to match the stellar mass content of massive haloes. On 
smaller scales, gas cooling processes become important and enhance 
the power on scales smaller than k ~ 100h Mpc_'. Here, the power spec- 
trum can deviate by a factor of a few compared to collisionless results. 
Measuring this fundamental statistic precisely hence requires high- 
resolution hydrodynamic simulations in large volumes; it cannot be 
done by dark-matter-only simulations or semi-analytic modelling. We 
also present in Fig. 5 the predictions of commonly employed empirical 
fitting models for the nonlinear evolution?’ of P(k). Those models have 
been calibrated on dark-matter-only simulations and clearly fail to des- 
cribe the results of full hydrodynamic calculations, rendering their use 
impractical for the precision requirements of upcoming surveys. 


Looking ahead 


Although our simulation provides a significant step forward in mod- 
elling galaxy formation by reproducing simultaneously many dispar- 
ate observations on large and small scales, there are still outstanding 
problems. One such problem lies in the formation of low-mass galaxies: 
our simulation tends to build up the stellar mass of low-mass galaxies 
below Mx = 10'°M 5 too early, resulting in stellar populations that are 
too old, with mean ages a factor of two to three larger than observed. 
This tension is shared by semi-analytic models and other recent hydro- 
dynamic simulations alike, pointing towards an open problem in low- 
mass galaxy formation*’. It remains to be seen whether new stellar 
feedback models that, for example, include effects of radiation pres- 
sure on dust can resolve this issue”. It will clearly be challenging to test 
new schemes that also directly treat the stellar radiation fields with sta- 
tistically meaningful samples of galaxies, as this poses extremely high 
computational demands that go significantly beyond what was achieved 
in the present work. Nevertheless, such new generations of large-scale 
high-resolution hydrodynamic simulations might become feasible with- 
in the next decade. 


METHODS SUMMARY 


The equations of gravity and hydrodynamics are evolved using the moving-mesh 
code AREPO® combined with a galaxy formation model which includes®: gas cool- 
ing with radiative self-shielding corrections; star formation; energetic feedback from 
growing SMBHs and exploding supernovae; stellar evolution with associated chem- 
ical enrichment and stellar mass loss; and radiation proximity effects for AGNs. 
Our AGN feedback consists of three components: thermal quasar-mode feedback; 
thermal-mechanical radio-mode feedback; and radiative feedback. The efficiency 
of physical processes below the resolution scale are treated in parameterized form 
and have been calibrated to reproduce the observed global efficiency of star forma- 
tion. Our simulation assumes a ACDM cosmology with the parameters Qm = 0.2726, 
Q = 0.7274, Q, = 0.0456, og = 0.809, n, = 0.963 and Hy = 100hkms | Mpc ', 
with h = 0.704. Here Qn, Qa, and Q are respectively the mass/energy density 
contributions from matter, the cosmological constant and baryons, h encodes the 
Hubble expansion rate at redshift zero, n, is the spectral index of the primordial 
power spectrum and gg is the root mean squared amplitude of mass fluctuations 
in 8h_' Mpc spheres linearly extrapolated to redshift zero. Initial conditions are 
generated at z = 127 ina periodic box witha side length of 75h ' Mpc = 106.5 Mpc. 
The initial gas temperature at z = 127 is set to 245 K. We have also performed a 
second simulation, Illustris-Dark, which does not include baryons and the related 
feedback, but is otherwise identical to Illustris. Dark-matter haloes were identified 
in an on-the-fly manner during the simulation for each snapshot using a friends- 
of-friends (FOF) algorithm with a linking length of 0.2 times the mean particle 
separation. The minimum dark-matter particle number was set to 32 for the FOF 
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identification. Non-dark-matter particles are attached to these FOF primaries in a 
secondary linking stage“*. Subsequently, gravitationally bound substructures are 
identified using the SUBFIND algorithm**’. 16 million CPU hours were needed 
to evolve the simulation from the starting redshift z = 127 to z = 0, using 8,192 
cores and an equal number of MPI-ranks. An additional 3 million CPU hours were 
spent on carrying out the on-the-fly galaxy identification with the SUBFIND algorithm. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
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METHODS 


Simulation code. Our simulation code, AREPO, uses an unstructured Voronoi 
tessellation of the simulation volume, where the mesh-generating points of this 
tessellation are moved with the gas flow. The adaptive mesh is used to solve the 
equations of ideal hydrodynamics with a finite volume approach using a second- 
order unsplit Godunov scheme with an exact Riemann solver. This approach is 
under most circumstances superior to traditional smoothed particle hydrodynamics 
(SPH), and also to Eulerian adaptive mesh refinement (AMR)***'. This scheme 
naturally produces extended disk galaxies without invoking extreme forms of stellar 
feedback or star formation”, which was a major problem of previous galaxy forma- 
tion simulations. The gravity calculation employs a Tree-PM scheme’, where long- 
range forces are determined with a particle-mesh method (PM) while short-range 
forces are computed via a hierarchical tree algorithm”. 

Galaxy formation physics. Our simulation accounts for a variety of astrophysical 
processes known to be relevant for galaxy formation”. 

Gas cooling rates are calculated as a function of gas density, temperature, metal- 
licity, the radiation fields of active galactic nuclei and the spatially uniform but 
time-dependent ionizing background radiation from galaxies and quasars*’, which 
completes H1 reionization at a redshift of z ~ 6. We use a self-consistent calcula- 
tion of the primordial cooling and complement it with the cooling contribution of 
the metals, based on pre-calculated cooling rate tables using CLOUDY™. All cool- 
ing rates include self-shielding corrections”. 

We employ a sub-resolution model of the interstellar medium to achieve nu- 
merical closure below our resolution scale. Gas with hydrogen number density above 
0.13 cm” * follows an effective equation of state with a stochastic prescription for 
star formation following the Kennicutt-Schmidt law” and adopting a Chabrier 
initial mass function’. The effective equation of state assumes that the interstellar 
medium has a two-phase structure that is predominantly composed of cold clouds 
embedded in a tenuous, supernova-heated phase™. 

Once stellar populations are born, they can lose mass, for example through stellar 
winds or supernovae. This mass is returned to the gas phase and enriches the gas 
surrounding stellar populations. We track the evolution of stars and model super- 
novae of type Ia, type II and the asymptotic giant branch phases of stars. We trace 
the evolution of nine elements in total (H, He, C, N, O, Ne, Mg, Si, Fe), each ad- 
vected as a passive scalar. 

Stellar feedback is realized through a kinetic wind scheme with a velocity scaling 
based on the local one-dimensional dark-matter velocity dispersion (3.70pm,1p)> 
anda mass loading inferred from energy conservation assuming 1.09 X 10°! erg per 
type II supernova. We use a sub-grid metal-loading scheme that regulates the degree 
of wind enrichment such that 40% of the local interstellar medium metals are ejected 
by supernova-driven galactic winds. This is required to simultaneously reproduce 
the stellar mass content of low-mass haloes and their gas oxygen abundances”. 

We include procedures for supermassive black hole (SMBH) seeding, accretion 
and merging”. Feedback from SMBHs operates in either a quasar-mode or a radio- 
mode, depending on their accretion rate®. In addition, a prescription for radiative 
SMBH feedback is included that modifies the ionization state and hence the net 
cooling rate of nearby gas. SMBHs are seeded in friends-of-friends groups more 
massive than 7.1 X 10'°M. with a seed mass of 1.4 X 10°Mo. 

The free parameters of our model are set to physically plausible values and have 
been adjusted within the allowed range to roughly reproduce the relation between 
mean stellar mass and halo mass inferred from abundance matching analysis. The 
resulting parameter settings have been tested on smaller-scale simulations’ and 
high-resolution zoom-in simulations of individual Milky Way-like haloes™. 
Initial conditions. To create initial conditions we use the Boltzmann code 
CAMB*** to compute the linear power spectrum of a ACDM cosmology with 
the parameters Qm = 0.2726, Q4 = 0.7274, Q, = 0.0456, og = 0.809, n, = 0.963 
and Ho = 100hkms~' Mpc ' with h = 0.704. These parameters are consistent with 
the latest Wilkinson Microwave Anisotropy Probe (WMAP)-9 measurements”, 
but slightly offset from the first year results of the Planck mission’. However, a recent 
re-analysis of the Planck data found parameters more consistent with pre-Planck 
cosmic microwave background analyses and astronomical observations®. 

We create a random realization of this cosmology in periodic boxes with a side 
length of 75h”! Mpc = 106.5 Mpc, starting from an initial ‘glass-like’ particle con- 
figuration” composed of one thousand 182° particle tiles. We employ a 3,640? fast 
Fourier transform to calculate the displacement field and use Lagrangian perturba- 
tion theory (Zel'dovich approximation”) to displace particles, and we de-convolve 
the input power spectrum for smoothing effects due to the interpolation off this 
grid. Initial conditions are generated at z = 127 with mesh-generating points added 
to the initial conditions by splitting each original particle into a dark matter and gas 
cell pair, displacing them with respect to one another such that two interleaved 
grids are formed, keeping the centre-of-mass of each pair fixed. The initial gas tem- 
perature at z = 127 is set to 245 K based on a RECFAST”” calculation. We have 
generated 100 different random fields and inspected their power spectra and mass 
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functions at z = 0 to make sure that we do not simulate an unusual or extreme den- 
sity field that is dominated by a few large clusters or voids due to cosmic variance. 
Simulation details. The simulation volume contains initially 6,028,568,000 hydro- 
dynamic cells and the same number of dark-matter particles resulting in a dark- 
matter mass resolution of 6.26 X 10°M, anda baryonic mass resolution of 1.26 X 
10°Mo. The gravitational softening length of dark-matter particles is fixed in comov- 
ing coordinates (£pm = 1h kpc). For baryonic particles (stars and SMBHs), we 
limit the softening length to a maximum physical scale (éparyon = 0.5h j kpc). Gas 
cells use an adaptive softening length tied to their cell radius with a floor given by 
the softening length of the collisionless baryonic particles. We employ a (de-) re- 
finement scheme which keeps the cell masses typically within a factor of two of a 
specified target mass set to 1.26 X 10°Mo, anda regularization scheme steering the 
mesh towards a computationally efficient centroidal configuration’**“*. The smal- 
lest cells in Illustris have a typical extent of 48 pc. For the least massive cells we 
achieve a mass resolution of 1.5 X 10*Mo. 

Galaxy identification. Dark-matter haloes were identified in an on-the-fly man- 
ner during the simulation for each snapshot using a friends-of-friends (FOF) 
algorithm” with a linking length of 0.2 times the mean particle separation. The 
minimum dark-matter particle number was set to 32 for the FOF identification. 
Non-dark-matter particles are attached to these FOF primaries in a secondary link- 
ing stage“*. Subsequently, gravitationally bound substructures are identified using 
the SUBFIND algorithm****. We derive the various galaxy properties from the grav- 
itationally bound mass that is contained within a radius r« that equals twice the 
stellar half-mass radius of each SUBFIND (sub)halo. Using this definition, the ga- 
lactic stellar mass does not significantly differ from the total stellar mass for low- 
mass systems, but some of the intra-cluster light for massive systems is excluded. 
We have checked this definition against surface brightness cuts in different bands 
and find it to give similar results to more elaborate methods for excluding intra- 
cluster light. We have extended this radius by 50% for measuring the galactic H1 
masses. Stellar population synthesis models” were used to associate the stars in our 
simulation with observable broad band luminosities. Producing stellar images of 
galaxies requires assigning colour values to specific bands. Specifically, we make an 
RGB mapping of the (g,r,i) bands using a commonly employed asinh scaling”®. 
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